[Anthropic] Support system role messages inside messages array by chaunceyjiang · Pull Request #44283 · vllm-project/vllm

chaunceyjiang · 2026-06-02T06:19:33Z

Purpose

[Anthropic] Support system role messages inside messages array

FIX #44000

Test Result

before

after

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

BEFORE SUBMITTING, PLEASE READ https://docs.vllm.ai/en/latest/contributing (anything written below this line will be removed by GitHub Actions)

Co-Authored-By: Aleksandar Yanakiev <alexander.yanakiev@discretestack.com> Co-Authored-By: Ang Kah Min, Kelvin <syraxius@hotmail.com> Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>

aleksandaryanakiev · 2026-06-02T08:25:57Z

This looks better, I'm closing my PR as it's not needed anymore

chaunceyjiang · 2026-06-02T09:06:19Z

/cc @DarkLight1337 @sfeng33 PTAL.

sfeng33

LGTM, thanks!

…project#44283) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> Co-authored-by: Aleksandar Yanakiev <alexander.yanakiev@discretestack.com> Co-authored-by: Ang Kah Min, Kelvin <syraxius@hotmail.com> Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>

…project#44283) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> Co-authored-by: Aleksandar Yanakiev <alexander.yanakiev@discretestack.com> Co-authored-by: Ang Kah Min, Kelvin <syraxius@hotmail.com>

felix0080 · 2026-06-05T01:29:40Z

+                        system_parts.append(block.text)
+
+        # System messages embedded inside the messages array
+        for msg in anthropic_request.messages:


@chaunceyjiang @aleksandaryanakiev @sfeng33 @andrew @potatosalad I'm a bit concerned about the system role fix. It seems like merging a mid-conversation system:role message into a single system message could cause issues with KV-cache hits. In multi-turn conversations, this would likely change the prefix, potentially hurting cache reuse.

Yes, I have also observed this issue. The fix here is not correct. I am trying a new solution.

@chaunceyjiang
OK, I also have an idea here. Later, I will prepare a Merge Request for you. You can check if it meets your requirements.

@chaunceyjiang @aleksandaryanakiev @sfeng33 @andrew
please check this merge request: https://github.com/vllm-project/vllm/pull/44602/changes

…ching PR vllm-project#44283 merged all inline system:role messages into a single leading system message, which changes the conversation prefix and breaks KV-cache hits in multi-turn dialogues. This fix keeps inline system messages at their original position: - Remove inline system extraction from _convert_system_message (only top-level system is handled there) - In _convert_messages, handle system messages with a dedicated _extract_system_text helper that strips billing headers and only emits the message if real content exists — avoiding the _convert_block / _convert_message_content path which does not strip billing headers and may omit the "content" key - Add tests for billing header stripping on inline system messages Unlike vllm-project#44048 which moves the same merge logic to the protocol layer, this approach fundamentally avoids the prefix-breaking merge entirely. Co-authored-by: Hermes Agent

felix0080 · 2026-06-05T03:07:02Z

I noticed the prefix caching concern discussed here. I opened #44602 with an alternative approach that preserves inline role: system messages at their original position instead of merging them into the leading system message, so the conversation prefix structure stays intact for KV-cache hits. This also handles x-anthropic-billing-header stripping consistently for both top-level and inline system messages. @chaunceyjiang

…ching PR vllm-project#44283 merged all inline system:role messages into a single leading system message, which changes the conversation prefix and breaks KV-cache hits in multi-turn dialogues. This fix keeps inline system messages at their original position: - Remove inline system extraction from _convert_system_message (only top-level system is handled there) - In _convert_messages, handle system messages with a dedicated _extract_system_text helper that strips billing headers and only emits the message if real content exists — avoiding the _convert_block / _convert_message_content path which does not strip billing headers and may omit the "content" key - Add tests for billing header stripping on inline system messages Unlike vllm-project#44048 which moves the same merge logic to the protocol layer, this approach fundamentally avoids the prefix-breaking merge entirely. Co-authored-by: Hermes Agent Signed-off-by: felix0080 <felix0080@users.noreply.github.com>

…project#44283) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> Co-authored-by: Aleksandar Yanakiev <alexander.yanakiev@discretestack.com> Co-authored-by: Ang Kah Min, Kelvin <syraxius@hotmail.com> Signed-off-by: JisoLya <523420504@qq.com>

…project#44283) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> Co-authored-by: Aleksandar Yanakiev <alexander.yanakiev@discretestack.com> Co-authored-by: Ang Kah Min, Kelvin <syraxius@hotmail.com>

huangyxi · 2026-06-08T08:49:53Z

Hello! I found that the current Anthropic streaming response does not include the "assistant" role, which does not align with the official Anthropic response format. As a result, some third-party applications fail to parse the response correctly. For example, this issue occurs in CloudCLI:

https://github.com/siteboon/claudecodeui/blob/dd77649053769b91886897ec64fbd1c15e7a7a75/server/modules/providers/list/claude/claude-sessions.provider.ts#L493

I suggest adding

role="assistant"

between the two lines below. I tested this change locally, and it fixed the issue without affecting the normal Claude Code experience:

vllm/vllm/entrypoints/anthropic/serving.py

Lines 657 to 658 in 469f3dc

    
           id=origin_chunk.id, 
        
           content=[],

Since I'm not familiar with the vLLM contribution process, and the issue I found appears to be related to this existing PR, I hope someone from the community can help implement or review this simple fix.

## What this PR does / why we need it? Backports vLLM PR #44283 via a vllm-ascend platform monkey patch for the pinned /vllm-workspace/vllm runtime. The patch accepts `role: system` entries in Anthropic Messages API `messages`, merges inline system content with the top-level `system` prompt, strips Claude Code billing headers in both places, and skips inline system entries when converting the remaining chat history. Fixes vllm-project/vllm#44000 Backports vllm-project/vllm#44283 ## Does this PR introduce _any_ user-facing change? Yes. Anthropic-compatible `/v1/messages` requests from newer Claude Code clients can include `role: system` messages inside the `messages` array without failing validation. ## How was this patch tested? - `pytest -q tests/ut/patch/platform/test_patch_anthropic_system_message.py` - `ruff check vllm_ascend/patch/platform/patch_anthropic_system_message.py tests/ut/patch/platform/test_patch_anthropic_system_message.py vllm_ascend/patch/platform/__init__.py vllm_ascend/patch/__init__.py` - `ruff format --check vllm_ascend/patch/platform/patch_anthropic_system_message.py tests/ut/patch/platform/test_patch_anthropic_system_message.py vllm_ascend/patch/platform/__init__.py vllm_ascend/patch/__init__.py` - vLLM version: v0.20.2 - vLLM main: vllm-project/vllm@9090368 Signed-off-by: QwertyJack <7554089+QwertyJack@users.noreply.github.com> Co-authored-by: QwertyJack <7554089+QwertyJack@users.noreply.github.com>

…project#44283) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> Co-authored-by: Aleksandar Yanakiev <alexander.yanakiev@discretestack.com> Co-authored-by: Ang Kah Min, Kelvin <syraxius@hotmail.com> Signed-off-by: Waqar Ahmed <waqar.ahmed@amd.com>

[Anthropic] Support system role messages inside messages array

5a51e5a

Co-Authored-By: Aleksandar Yanakiev <alexander.yanakiev@discretestack.com> Co-Authored-By: Ang Kah Min, Kelvin <syraxius@hotmail.com> Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>

chaunceyjiang requested review from AndreasKaratzas, DarkLight1337, NickLucche, aarnphm, mgoin and robertgshaw2-redhat as code owners June 2, 2026 06:19

mergify Bot added the frontend label Jun 2, 2026

This was referenced Jun 2, 2026

[Bugfix][Anthropic] Normalize Claude Code system messages #44048

Open

[Anthropic][Frontend] auto-extract system messages from messages array #43959

Closed

sfeng33 approved these changes Jun 2, 2026

View reviewed changes

sfeng33 added the ready ONLY add when PR is ready to merge/full CI is needed label Jun 2, 2026

sfeng33 enabled auto-merge (squash) June 2, 2026 16:20

sfeng33 merged commit ed9a752 into vllm-project:main Jun 2, 2026
51 checks passed

This was referenced Jun 3, 2026

[BugFix][Platform] Backport Anthropic inline system messages vllm-project/vllm-ascend#9920

Merged

[BugFix][v0.20.2rc] Backport Anthropic inline system messages vllm-project/vllm-ascend#9921

Closed

zhangshuoming990105 mentioned this pull request Jun 3, 2026

[Frontend] Report cache usage in Anthropic /v1/messages API #40912

Open

felix0080 reviewed Jun 5, 2026

View reviewed changes

felix0080 mentioned this pull request Jun 5, 2026

fix(anthropic): preserve inline system message position for prefix caching #44602

Open

sfeng33 mentioned this pull request Jun 5, 2026

[Usage]: Claude code does not work with vLLM #44576

Open

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Anthropic] Support system role messages inside messages array#44283

[Anthropic] Support system role messages inside messages array#44283
sfeng33 merged 1 commit into
vllm-project:mainfrom
chaunceyjiang:anthropic_system_messages

chaunceyjiang commented Jun 2, 2026 •

edited

Loading

Uh oh!

aleksandaryanakiev commented Jun 2, 2026

Uh oh!

chaunceyjiang commented Jun 2, 2026

Uh oh!

sfeng33 left a comment

Uh oh!

Uh oh!

felix0080 Jun 5, 2026

Uh oh!

chaunceyjiang Jun 5, 2026

Uh oh!

felix0080 Jun 5, 2026

Uh oh!

felix0080 Jun 5, 2026

Uh oh!

felix0080 commented Jun 5, 2026 •

edited

Loading

Uh oh!

huangyxi commented Jun 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Uh oh!

Conversation

chaunceyjiang commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Purpose

Test Result

Uh oh!

aleksandaryanakiev commented Jun 2, 2026

Uh oh!

chaunceyjiang commented Jun 2, 2026

Uh oh!

sfeng33 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

felix0080 Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

chaunceyjiang Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

felix0080 Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

felix0080 Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

felix0080 commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

huangyxi commented Jun 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

chaunceyjiang commented Jun 2, 2026 •

edited

Loading

felix0080 commented Jun 5, 2026 •

edited

Loading